294 research outputs found

Towards a balanced named entity corpus for Dutch

Author: Desmet Bart
Hoste Veronique
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2010

The GW/LT3 VarDial 2016 shared task system for dialects and similar languages detection

Author: Desmet Bart
Diab Mona
Zirikly Ayah
Publication venue: The COLING 2016 Organizing Committee
Publication date: 01/01/2016

This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2). For both tasks, we experimented with Logistic Regression and Neural Network classifiers in isolation. Additionally, we implemented a cascaded classifier that consists of coarse and fine-grained classifiers (task 1) and a classifier ensemble with majority voting for task 2. The submitted systems obtained state-of-the-art performance and ranked first for the evaluation on social media data (test sets B1 and B2 for task 1), with a maximum weighted F1 score of 91.94%.

Ghent University Academic Bibliography

Mental distress detection and triage in forum posts: the LT3 CLPsych 2016 shared task system

Author: Desmet Bart
Hoste Veronique
Jacobs Gilles
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

This paper describes the contribution of LT3 for the CLPsych 2016 Shared Task on automatic triage of mental health forum posts. Our systems use multiclass Support Vector Machines (SVM), cascaded binary SVMs and ensembles with a rich feature set. The best systems obtain macro-averaged F-scores of 40% on the full task and 80% on the green versus alarming distinction. Multiclass SVMs with all features score best in terms of F-score, whereas feature filtering with bi-normal separation and classifier ensembling are found to improve recall of alarming posts.

Crossref

Ghent University Academic Bibliography

Finding the online cry for help : automatic text classification for suicide prevention

Author: Desmet Bart
Publication venue: Ghent University. Faculty of Arts and Philosophy
Publication date: 01/01/2014

Successful prevention of suicide, a serious public health concern worldwide, hinges on the adequate detection of suicide risk. While online platforms are increasingly used for expressing suicidal thoughts, manually monitoring for such signals of distress is practically infeasible, given the information overload suicide prevention workers are confronted with. In this thesis, the automatic detection of suicide-related messages is studied. It presents the first classification-based approach to online suicidality detection, and focuses on Dutch user-generated content. In order to evaluate the viability of such a machine learning approach, we developed a gold standard corpus, consisting of message board and blog posts. These were manually labeled according to a newly developed annotation scheme, grounded in suicide prevention practice. The scheme provides for the annotation of a post's relevance to suicide, and the subject and severity of a suicide threat, if any. This allowed us to derive two tasks: the detection of suicide-related posts, and of severe, high-risk content. In a series of experiments, we sought to determine how well these tasks can be carried out automatically, and which information sources and techniques contribute to classification performance. The experimental results show that both types of messages can be detected with high precision. Therefore, the amount of noise generated by the system is minimal, even on very large datasets, making it usable in a real-world prevention setting. Recall is high for the relevance task, but at around 60%, it is considerably lower for severity. This is mainly attributable to implicit references to suicide, which often go undetected. We found a variety of information sources to be informative for both tasks, including token and character ngram bags-of-words, features based on LSA topic models, polarity lexicons and named entity recognition, and suicide-related terms extracted from a background corpus. 
To improve classification performance, the models were optimized using feature selection, hyperparameter optimization, or a combination of both. A distributed genetic algorithm approach proved successful in finding good solutions for this complex search problem, and resulted in more robust models. Experiments with cascaded classification of the severity task did not reveal performance benefits over direct classification (in terms of F1-score), but its structure allows the use of slower, memory-based learning algorithms that considerably improved recall. At the end of this thesis, we address a problem typical of user-generated content: noise in the form of misspellings, phonetic transcriptions and other deviations from the linguistic norm. We developed an automatic text normalization system, using a cascaded statistical machine translation approach, and applied it to normalize the data for the suicidality detection tasks. Subsequent experiments revealed that, compared to the original data, normalized data resulted in fewer and more informative features, and improved classification performance. This extrinsic evaluation demonstrates the utility of automatic normalization for suicidality detection, and more generally, text classification on user-generated content.

Ghent University Academic Bibliography

TA-COS 2018 : 2nd Workshop on Text Analytics for Cybersecurity and Online Safety : Proceedings

Author: De Pauw Guy
Desmet Bart
Lefever Els
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2018

Ghent University Academic Bibliography

Influence of grid configuration on current conducting behaviour in PV installations

Author: Debruyne Colin
Desmet Jan
Vandevelde Lieven
Verhelst Bart
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

On the roof of an industrial site, a 385 kWp photovoltaic installation is operational. When production of this system reaches 60% of the installed power, the circuit breaker trips. At sufficient production, measurements show a high distortion of phase voltage and variable waveform of both phase voltage and current. Analysis of the installation showed that a Yy0 transformer is used, introducing a high zero sequence impedance. Unbalance in the injected current combined with a high zero sequence impedance leads to a high neutral-ground voltage and distorted phase-neutral voltages. In this paper it will be shown that the tripping of the circuit breaker is caused by the measurement method of the device. This paper analyses the practical measurement results, causes of errors and the solution to the stated problem.

Crossref

Ghent University Academic Bibliography

UGENT-LT3 SCATE system for machine translation quality estimation

Author: Desmet Bart
Hoste Veronique
Macken Lieve
Tezcan Arda
Publication date: 01/01/2015

This paper describes the submission of the UGENT-LT3 SCATE system to the WMT15 Shared Task on Quality Estimation (QE), viz. English-Spanish word and sentence-level QE. We conceived QE as a supervised Machine Learning (ML) problem and designed additional features and combined these with the baseline feature set to estimate quality. The sentence-level QE system re-uses the word level predictions of the word-level QE system. We experimented with different learning methods and observe improvements over the baseline system for word-level QE with the use of the new features and by combining learning methods into ensembles. For sentence-level QE we show that using a single feature based on word-level predictions can perform better than the baseline system and using this in combination with additional features led to further improvements in performance.

Crossref

Ghent University Academic Bibliography

Archivsystem Ask23

Simultaneous interpretation of numbers and the impact of technological support

Author: Defrancq Bart
Desmet Bart
Vandierendonck Mieke
Publication venue: Language Science Press
Publication date: 01/01/2018

Ghent University Academic Bibliography

Technical SWOT analysis of decentralised production for low voltage grids in Flanders

Author: Debruyne Colin
Desmet Jan
Vandevelde Lieven
Verhelst Bart
Publication date: 01/01/2012

Increasing energy prices, combined with high government funding, have resulted in a massive integration of decentralised electrical energy production units in Belgium. These systems are mainly photovoltaic systems, and the sudden increase in both the number and the power ratio of the DG systems has put additional stress on the distribution network. In this paper a technical SWOT analysis is presented. The researchers believe that a solution that relieves this stress can bring additional benefits for both end users and distribution network operators.

Ghent University Academic Bibliography

Dutch named entity recognition using ensemble classifiers

Author: Desmet Bart
Hoste Veronique
Publication venue: Landelijke Onderzoeksschool Taalwetenschap (LOT)
Publication date: 01/01/2010

Ghent University Academic Bibliography